Provide `{to,from}_{ne,le,be}_bytes` functions on integers #51919

tbu- · 2018-06-29T20:47:24Z

If one doesn't view integers as containers of bytes, converting them to
bytes necessarily needs the specfication of encoding.

I think Rust is a language that wants to be explicit. The to_bytes
function is basically the opposite of that – it converts an integer into
the native byte representation, but there's no mention (in the function
name) of it being very much platform dependent. Therefore, I think it
would be better to replace that method by three methods, the explicit
to_ne_bytes ("native endian") which does the same thing and
to_{le,be}_bytes which return the little- resp. big-endian encoding.

rust-highfive · 2018-06-29T20:47:28Z

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

tbu- · 2018-06-29T20:49:19Z

I think these methods also make more sense than the to_be and to_le function from a type-system point of view: It doesn't make sense to repeatedly call to_be on an integer, but you can, the same way that it doesn't make sense to repeatedly call encode() on a string in Python 2, but you can do so there (which will raise an error for non-ASCII, but still).

tbu- · 2018-06-29T20:52:08Z

CC #49792

kennytm · 2018-06-30T06:36:25Z

src/libcore/num/mod.rs

+        ///
+        /// [`to_native_bytes`]: #method.to_native_bytes
+        #[rustc_deprecated(since = "1.29.0", reason = "method doesn't communicate
+            intent, use `to_native_bytes`, `to_le_bytes` or `to_be_bytes` instead")]
        #[stable(feature = "int_to_from_bytes", since = "1.29.0")]


If this PR is accepted, we should better keep to_bytes()/from_bytes() unstable or just remove them, instead of stabilize and immediately deprecate them.

alexreg · 2018-07-01T21:29:06Z

Very fair point! Three different method classes makes sense.

felix91gr · 2018-07-02T16:04:08Z

Hi! I don't know if I'm supposed to add my 2 cents here, but since I can, I will.

I've been using the to_bytes kind of transformation without regards for the endianness. This is because this is what my program is doing:

Receive an N-Dimensional Vec<f32>
Take its (non-randomly-seeded) u64 hash
Take the bytes of that hash
Feed those bytes to a pseudorandom generator as its seed.

I'm using this to make a generalized N-D Perlin (and in the future, Simplex) noise library. Since I'm using the hash's bytes as the seed for the pseudorandom generator, I didn't care about endianness when I implemented it.

But now that I think of it, specifying the endianness would allow me to have the exact same noise patterns regardless of the platform in which the library is running. So I think, even in my case, this is a better feature than the recently merged to_bytes :)

Thanks for your hard work 🙇‍♂️

tbu- · 2018-07-05T07:42:31Z

Python implements a similar API on integers:

 |  from_bytes(...) from builtins.type
 |      int.from_bytes(bytes, byteorder, *, signed=False) -> int
 |      
 |      Return the integer represented by the given array of bytes.
 |      
 |      The bytes argument must be a bytes-like object (e.g. bytes or bytearray).
 |      
 |      The byteorder argument determines the byte order used to represent the
 |      integer.  If byteorder is 'big', the most significant byte is at the
 |      beginning of the byte array.  If byteorder is 'little', the most
 |      significant byte is at the end of the byte array.  To request the native
 |      byte order of the host system, use `sys.byteorder' as the byte order value.
 |      
 |      The signed keyword-only argument indicates whether two's complement is
 |      used to represent the integer.
 |  
 |  to_bytes(...)
 |      int.to_bytes(length, byteorder, *, signed=False) -> bytes
 |      
 |      Return an array of bytes representing an integer.
 |      
 |      The integer is represented using length bytes.  An OverflowError is
 |      raised if the integer is not representable with the given number of
 |      bytes.
 |      
 |      The byteorder argument determines the byte order used to represent the
 |      integer.  If byteorder is 'big', the most significant byte is at the
 |      beginning of the byte array.  If byteorder is 'little', the most
 |      significant byte is at the end of the byte array.  To request the native
 |      byte order of the host system, use `sys.byteorder' as the byte order value.
 |      
 |      The signed keyword-only argument determines whether two's complement is
 |      used to represent the integer.  If signed is False and a negative integer
 |      is given, an OverflowError is raised.

(Note that the byteorder argument is mandatory.)

alexcrichton · 2018-07-12T18:06:43Z

This was discussed briefly at @rust-lang/libs triage yesterday but the conclusion was that we're going to leave comments individually here if we feel so.

I personally would not want to merge this PR. I think that leveraging the already-stable to_le and to_be methods to work in combination with to_bytes and from_bytes (already implemented) is the best way to work with these methods. Given that to_le and to_be are already stable I wouldn't feel we need to stabilize to_le_bytes or to_be_bytes. The question then to me is whether we need to include "native" in the name. I feel that the fact that these are native-endianness is implied by the lack of other information in the method name as otherwise we'd specify big or little.

alexcrichton · 2018-07-12T18:07:05Z

@rfcbot fcp close

Let's see what others think though!

rfcbot · 2018-07-12T18:07:05Z

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

tbu- · 2018-07-13T18:22:48Z

The question then to me is whether we need to include "native" in the name. I feel that the fact that these are native-endianness is implied by the lack of other information in the method name as otherwise we'd specify big or little.

As you can see in this thread, the implied native isn't clear for everyone. In fact, I think one should actively think about whether one wants a big-endian, little-endian or native byte representation for the integer. Having to_le and to_be on the integers does not achieve this, as you can just omit calls to them and never notice. If you instead have the choice between to_le_bytes, to_be_bytes, to_native_bytes, it is very clear that you have to make a choice, and it's immediately documented, too. If you see to_bytes in the wild, you wouldn't really know whether the author intended to make the code platform-specific or just forgot to add to_be or to_le.

SimonSapin · 2018-07-13T18:22:53Z

This alternative was discussed in #49792 and I mildly preferred the current design in order to avoid the multiplication of API surface. This PR proposes replacing two methods with six.

I don’t really mind changing, especially since to_bytes and from_bytes were only stabilized in this cycle (1.29) so we can still remove them. However the motivation based only on principle feels a bit weak.

@tbu-, do you feel that this change would prevent bugs? What do you think of the "intermediate" option of renaming the existing methods but not adding the be/le ones? (Keep documenting them as to be combined with to_le and to_be.)

tbu- · 2018-07-13T18:34:14Z

@tbu-, do you feel that this change would prevent bugs?

Yes, this is the intention behind this change. As you can see above, we already caught one with the PR alone. :)

What do you think of the "intermediate" option of renaming the existing methods but not adding the be/le ones?

I think that would be worse than what we currently have, as to_be().to_native_bytes() looks wrong, as we're combining a big-endian function with a function operating on native endian.

I think the optimization based on number of functions is misguided.

In C, zero-terminated "strings" share the same type, whether they're encoded as UTF-8 or just treated as bag of bytes. Rust makes the distinction because it is useful to prevent errors, it makes sense to make the user cast between bytes and UTF-8-strings, even if this results in an enormous method duplication between byte slices and strings.

As shown with the Python example, other languages consider the endianness of the integers part of the encoding process, too.

alexcrichton · 2018-07-13T20:05:44Z

I think I'm personally just not sold on this. I see the cost of six functions, some of which duplicate the functionality of existing functions, as not worth it. Dealing with endianness is inherently tricky and I don't feel like making all the methods longer and more wordy will really reduce the trickiness, but rather just make it less ergonomic to call when you already know which one you'd like.

tbu- · 2018-07-14T08:44:28Z

I see the cost of six functions, some of which duplicate the functionality of existing functions, as not worth it.

I understand this part.

don't feel like making all the methods longer and more wordy will really reduce the trickiness, but rather just make it less ergonomic to call when you already know which one you'd like.

I don't really understand this part. It's

to_be().to_bytes()              → to_be_bytes()            [reduction by  5 chars]
to_le().to_bytes()              → to_le_bytes()            [reduction by  5 chars]
to_bytes()                      → to_native_bytes()        [increase  by  7 chars]
i32::from_be(i32::from_bytes()) → i32::from_be_bytes()     [reduction by 10 chars]
i32::from_le(i32::from_bytes()) → i32::from_le_bytes()     [reduction by 10 chars]
i32::from_bytes()               → i32::from_native_bytes() [increase  by  7 chars]

To me, it looks like it decreases the ergonomics of doing the platform-specific thing slightly but also increases the ergonomics of doing the platform-independent thing slightly, and the important part: It brings them to the same level. Currently, doing the platform-independent thing is slightly less ergonomic.

Rust wants to encourage good code, if we want to bring ergonomics of the methods into play, then I think an even playing field between platform-independent methods and platform-dependent methods is in line with Rust's objectives.

SimonSapin · 2018-07-15T10:06:40Z

I feel that "native" in a name doesn’t really work on its own. It only sort of does in "native endian", and even then only by opposition to little/big endian.

What do you all think of transmute_to_bytes and transmute_from_bytes, to express that this is a type-level change without a memory representation change?

Also the more I think about it the more I’m becoming partial to having explicit {to,from}_{le,be}_bytes conversions. The {to,from}_{le,be} methods have the merit of already being stable, but they’re kinda weird in taking and returning the same type. In fact the implementation of to_be and from_be for example is identical! I understand that they’re intended to be thought about differently, and perhaps their use is self-documenting what direction of conversion is intended, but it feels like the type system would be better suited to this (even though adding a whole series of dedicated types like struct BigEndianU32(u32); is probably not worth it).

alexcrichton · 2018-07-17T21:13:45Z

@tbu- to me ergonomics isn't literally how many characters you type but also how easy it is to remember idioms, and remembering that there's endian conversions on integers plus byte<-> int conversions means it's easy to remember how to compose. With everything it can be confusing about knowing which one was the one implemented and there's now a choice to be made of which to do in various circumstances.

@SimonSapin I think we could try to find a better name, yeah, but I'd reserve transmute for unsafe-like operations and would prefer to not have it here on a safe method.

marshrayms · 2018-07-18T02:14:27Z

I am today starting on my first Rust program. I need to read a binary file format. When reading and writing files or network data, the endianness is an inherent part of the format's or protocol's definition.

It seems far more complicated to me to first read an incorrect intermediate value using host endianness and then adjust it to the correct value with bit manipulations. Byte-swapping is a completely unnecessary complication when we can just read it correctly in the first place.

I would really appreciate it if conversions of the form 'u32::from_le_bytes([u8; 4])' were made available.

SimonSapin · 2018-08-03T18:15:32Z

This looks great, thanks! Sorry for one last hoop to jump through, could you change the tracking issue in #[unstable] attributes to #52963 ? r=me with that.

The old issue has already been in FCP, a new issue was opened for the new API.

tbu- · 2018-08-04T06:41:24Z

@SimonSapin Done.

SimonSapin · 2018-08-04T06:54:58Z

@bors r+

Thanks!

bors · 2018-08-04T06:54:59Z

📌 Commit 0ddfae5 has been approved by SimonSapin

…Sapin Provide `{to,from}_{ne,le,be}_bytes` functions on integers If one doesn't view integers as containers of bytes, converting them to bytes necessarily needs the specfication of encoding. I think Rust is a language that wants to be explicit. The `to_bytes` function is basically the opposite of that – it converts an integer into the native byte representation, but there's no mention (in the function name) of it being very much platform dependent. Therefore, I think it would be better to replace that method by three methods, the explicit `to_ne_bytes` ("native endian") which does the same thing and `to_{le,be}_bytes` which return the little- resp. big-endian encoding.

Rollup of 14 pull requests Successful merges: - #51919 (Provide `{to,from}_{ne,le,be}_bytes` functions on integers) - #52940 (Align 6-week cycle check with beta promotion instead of stable release.) - #52968 (App-lint-cability) - #52969 (rustbuild: fix local_rebuild) - #52995 (Remove unnecessary local in await! generator) - #52996 (RELEASES.md: fix the `hash_map::Entry::or_default` link) - #53001 (privacy: Fix an ICE in `path_is_private_type`) - #53003 (Stabilize --color and --error-format options in rustdoc) - #53022 (volatile operations docs: clarify that this does not help wrt. concurrency) - #53024 (Specify reentrancy gurantees of `Once::call_once`) - #53041 (Fix invalid code css rule) - #53047 (Make entire row of doc search results clickable) - #53050 (Make left column of rustdoc search results narrower) - #53062 (Remove redundant field names in structs)

bors · 2018-08-04T18:22:30Z

⌛ Testing commit 0ddfae5 with merge 215bf3a...

bors · 2018-08-04T22:22:30Z

💥 Test timed out

Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963

`{to,from}_{ne,le,be}_bytes` for unsigned integer types Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963

gamozolabs · 2018-08-27T13:35:42Z

I would really like to see this be put into a trait rather than functions.

As a trait this could be implemented by a builtin derive which would make it work for structures composed of primitive types like this is being implemented on currently.

This would allow for simple creation of packed (or C structures by ignoring padding) structures in Rust (like an IP header or binary file header) and then the automatic conversion to and from the raw bytes on the wire/file.

I have a library to do this now and I use it in many of my projects. I think the ability to be able to handle reading and writing packed structures safely from bytestreams is critical enough to a systems language to justify it being put into the core language rather than being an external library. Especially given it would be almost no code increase to the Rust base (would be a procmacro), and would have no implications to programs unless they decide to use the feature.

BurntSushi · 2018-08-27T14:23:03Z

@gamozolabs I think it would be appropriate to publish that library on crates.io, garner adoption and prove out the API. At that point, someone could consider writing an RFC, but I think it's probably immature to jump to that right now.

* Fix typos and unused lifetimes * Remove duplicated cast * As rust-lang/rust#51919 changes function name, use to_ne_bytes instead. * Remove stabled feature flag * Update toolchain to latest version

jonhoo · 2019-01-17T23:04:14Z

@BurntSushi will this affect byteorder at all?

BurntSushi · 2019-01-17T23:17:56Z

@jonhoo That's a pretty broad question! Nothing immediately pops out at me. byteorder should continue to work fine?

jonhoo · 2019-01-17T23:20:05Z

Hehe, that's true. I guess I was more wondering whether byteorder will transition to use these methods instead of having its own implementation?

BurntSushi · 2019-01-17T23:23:24Z

Hehe, that's true. I guess I was more wondering whether byteorder will transition to use these methods instead of having its own implementation?

Oh! Hah. That didn't cross my mind at all because byteorder is one of the most conservative crates I maintain, and I won't increase its MSRV for something like this.

rust-highfive assigned bluss Jun 29, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 29, 2018

kennytm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 30, 2018

kennytm reviewed Jun 30, 2018

View reviewed changes

kennytm added the I-nominated label Jul 4, 2018

TimNN added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 10, 2018

alexcrichton added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed I-nominated S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jul 12, 2018

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Jul 12, 2018

pietroalbini added the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Jul 23, 2018

Change tracking issue from rust-lang#49792 to rust-lang#51919

0ddfae5

The old issue has already been in FCP, a new issue was opened for the new API.

kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Aug 4, 2018

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 4, 2018

kennytm mentioned this pull request Aug 4, 2018

Rollup of 14 pull requests #53056

Merged

This comment has been minimized.

Sign in to view

bors merged commit 0ddfae5 into rust-lang:master Aug 4, 2018

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 4, 2018

SimonSapin added a commit to SimonSapin/rust that referenced this pull request Aug 14, 2018

{to,from}_{ne,le,be}_bytes for unsigned integer types

8defd15

Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963

SimonSapin mentioned this pull request Aug 14, 2018

{to,from}_{ne,le,be}_bytes for unsigned integer types #53358

Merged

vjackson725 mentioned this pull request Aug 18, 2018

Update to new to/from_bytes functions rust-lang/libm#146

Closed

zhzy0077 mentioned this pull request Sep 18, 2018

Remove rust-toolchain and fix some typos cswinter/LocustDB#27

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide `{to,from}_{ne,le,be}_bytes` functions on integers #51919

Provide `{to,from}_{ne,le,be}_bytes` functions on integers #51919

tbu- commented Jun 29, 2018 •

edited

Loading

rust-highfive commented Jun 29, 2018

tbu- commented Jun 29, 2018

tbu- commented Jun 29, 2018

kennytm Jun 30, 2018

alexreg commented Jul 1, 2018

felix91gr commented Jul 2, 2018 •

edited

Loading

tbu- commented Jul 5, 2018

alexcrichton commented Jul 12, 2018

alexcrichton commented Jul 12, 2018

rfcbot commented Jul 12, 2018 •

edited

Loading

tbu- commented Jul 13, 2018

SimonSapin commented Jul 13, 2018

tbu- commented Jul 13, 2018

alexcrichton commented Jul 13, 2018

tbu- commented Jul 14, 2018 •

edited

Loading

SimonSapin commented Jul 15, 2018

alexcrichton commented Jul 17, 2018

marshrayms commented Jul 18, 2018

SimonSapin commented Aug 3, 2018

tbu- commented Aug 4, 2018

SimonSapin commented Aug 4, 2018

bors commented Aug 4, 2018

This comment has been minimized.

bors commented Aug 4, 2018

bors commented Aug 4, 2018

gamozolabs commented Aug 27, 2018

BurntSushi commented Aug 27, 2018

jonhoo commented Jan 17, 2019

BurntSushi commented Jan 17, 2019

jonhoo commented Jan 17, 2019

BurntSushi commented Jan 17, 2019

Provide {to,from}_{ne,le,be}_bytes functions on integers #51919

Provide {to,from}_{ne,le,be}_bytes functions on integers #51919

Conversation

tbu- commented Jun 29, 2018 • edited Loading

rust-highfive commented Jun 29, 2018

tbu- commented Jun 29, 2018

tbu- commented Jun 29, 2018

kennytm Jun 30, 2018

Choose a reason for hiding this comment

alexreg commented Jul 1, 2018

felix91gr commented Jul 2, 2018 • edited Loading

tbu- commented Jul 5, 2018

alexcrichton commented Jul 12, 2018

alexcrichton commented Jul 12, 2018

rfcbot commented Jul 12, 2018 • edited Loading

tbu- commented Jul 13, 2018

SimonSapin commented Jul 13, 2018

tbu- commented Jul 13, 2018

alexcrichton commented Jul 13, 2018

tbu- commented Jul 14, 2018 • edited Loading

SimonSapin commented Jul 15, 2018

alexcrichton commented Jul 17, 2018

marshrayms commented Jul 18, 2018

SimonSapin commented Aug 3, 2018

tbu- commented Aug 4, 2018

SimonSapin commented Aug 4, 2018

bors commented Aug 4, 2018

This comment has been minimized.

bors commented Aug 4, 2018

bors commented Aug 4, 2018

gamozolabs commented Aug 27, 2018

BurntSushi commented Aug 27, 2018

jonhoo commented Jan 17, 2019

BurntSushi commented Jan 17, 2019

jonhoo commented Jan 17, 2019

BurntSushi commented Jan 17, 2019

Provide `{to,from}_{ne,le,be}_bytes` functions on integers #51919

Provide `{to,from}_{ne,le,be}_bytes` functions on integers #51919

tbu- commented Jun 29, 2018 •

edited

Loading

felix91gr commented Jul 2, 2018 •

edited

Loading

rfcbot commented Jul 12, 2018 •

edited

Loading

tbu- commented Jul 14, 2018 •

edited

Loading